Semantic segmentation based on sparse annotation has advanced in recent years. Sparse annotation labels only part of each object in the image and leaves the remainder unlabeled. Most existing approaches are time-consuming and often require a multi-stage training strategy. In this work, we propose a simple yet effective sparsely annotated semantic segmentation framework based on SegFormer, dubbed SASFormer, that achieves remarkable performance. Specifically, the framework first generates hierarchical patch attention maps, which are then multiplied by the network predictions to produce correlated regions separated by valid labels. We also introduce an affinity loss to ensure consistency between the features of the correlation results and the network predictions. Extensive experiments show that our approach is superior to existing methods and achieves state-of-the-art performance. The source code is available at https://github.com/su-hui-zz/SASFormer.
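The two training signals described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not SASFormer's code: pixels are flattened to rows of a `(N, C)` probability matrix, unlabeled pixels carry a hypothetical `ignore_index` of `-1`, "multiplying attention maps by predictions" is read as a row-normalized matrix product, and the L1 form of the consistency term is our own choice.

```python
import numpy as np

def partial_cross_entropy(probs, labels, ignore_index=-1, eps=1e-12):
    """Cross-entropy over labeled pixels only; unlabeled pixels carry ignore_index."""
    valid = labels != ignore_index
    if not valid.any():
        return 0.0
    # Probability assigned to the annotated class at each labeled pixel.
    p = probs[valid, labels[valid]]
    return float(-np.log(p + eps).mean())

def propagate_predictions(attn, probs):
    """Spread predictions along patch attention (rows normalized to sum to 1)."""
    attn = attn / attn.sum(axis=1, keepdims=True)
    return attn @ probs

def affinity_consistency(attn, probs):
    """Hypothetical affinity term: mean L1 gap between propagated and raw predictions."""
    return float(np.abs(propagate_predictions(attn, probs) - probs).mean())
```

With an identity attention map nothing is propagated, so the consistency term is zero; a uniform attention map averages predictions across pixels and the term becomes positive.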
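We present a parallel massage robot based on series elastic actuators (SEA) with a unified force control method. First, kinematic and static force models are established to obtain the corresponding control variables. Then, a novel force-position control strategy is proposed that separately controls the force along the surface normal direction and the displacement in the other two directions, without requiring a dynamic model of the robot. To evaluate its performance, we carried out a series of robotic massage experiments. The results show that the proposed massage manipulator can successfully achieve the desired force and motion patterns of massage tasks, earning high user-experience scores.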
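The force-position split can be pictured as a hybrid controller with a projector onto the surface normal. The sketch below is an assumption-laden illustration, not the paper's controller: it reads SEA force from spring deflection (Hooke's law), runs a PI force loop along the normal and a proportional position loop in the tangent plane, and outputs a Cartesian position increment, which is one way to avoid needing a robot dynamics model. The class name, gains, and the convention that `normal` points into the surface are all hypothetical.

```python
import numpy as np

def sea_force(spring_stiffness, motor_pos, link_pos):
    """Force transmitted by a series elastic actuator, read from spring deflection."""
    return spring_stiffness * (motor_pos - link_pos)

class HybridForcePositionController:
    """Hypothetical controller: PI force loop along the surface normal,
    proportional position loop in the two tangential directions.
    Outputs a Cartesian position increment (no dynamics model used)."""

    def __init__(self, kp_force=0.001, ki_force=0.0005, kp_pos=0.5, dt=0.01):
        self.kp_force = kp_force
        self.ki_force = ki_force
        self.kp_pos = kp_pos
        self.dt = dt
        self.force_integral = 0.0

    def step(self, normal, force_ref, force_meas, pos_ref, pos_meas):
        # `normal` is assumed to point into the surface, so a positive
        # increment along it presses harder.
        n = np.asarray(normal, dtype=float)
        n = n / np.linalg.norm(n)
        normal_proj = np.outer(n, n)            # projects onto the surface normal
        tangent_proj = np.eye(3) - normal_proj  # projects onto the tangent plane
        force_err = force_ref - force_meas
        self.force_integral += force_err * self.dt
        dx_normal = (self.kp_force * force_err + self.ki_force * self.force_integral) * n
        pos_err = np.asarray(pos_ref, dtype=float) - np.asarray(pos_meas, dtype=float)
        dx_tangent = self.kp_pos * (tangent_proj @ pos_err)
        return dx_normal + dx_tangent
```

Because the two loops act on orthogonal subspaces, a tangential position error never produces motion along the normal, and a force error never disturbs the stroke path.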
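Semi-supervised learning (SSL) improves model generalization by leveraging large amounts of unlabeled data to augment limited labeled samples. However, popular SSL evaluation protocols are currently often confined to computer vision (CV) tasks. In addition, previous work typically trains deep neural networks from scratch, which is time-consuming and environmentally unfriendly. To address these issues, we construct a Unified SSL Benchmark (USB) by selecting 15 diverse, challenging, and comprehensive tasks from CV, natural language processing (NLP), and audio processing (Audio), on which we systematically evaluate dominant SSL methods, and we open-source a modular and extensible codebase for the fair evaluation of these SSL methods. We further provide pre-trained versions of state-of-the-art neural models for CV tasks to make further tuning affordable. USB enables the evaluation of a single SSL algorithm on more tasks from multiple domains at a lower cost. Specifically, on a single NVIDIA V100, only 37 GPU days are required to evaluate FixMatch on the 15 tasks in USB, while 335 GPU days (279 GPU days on the 4 CV datasets other than ImageNet) are required on 5 CV tasks with the typical protocol.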
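FixMatch, the algorithm used in the cost comparison above, trains on unlabeled data with confidence-thresholded pseudo-labels: predictions on a weakly augmented view become hard targets for a strongly augmented view. A minimal NumPy sketch of that unlabeled loss follows; it is a generic illustration, not USB's codebase, and the function names and the default threshold of 0.95 are our assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fixmatch_unlabeled_loss(logits_weak, logits_strong, threshold=0.95):
    """Cross-entropy against pseudo-labels from the weak view, kept only when confident."""
    probs_weak = softmax(logits_weak)
    conf = probs_weak.max(axis=1)
    pseudo = probs_weak.argmax(axis=1)
    mask = conf >= threshold
    log_p_strong = np.log(softmax(logits_strong) + 1e-12)
    ce = -log_p_strong[np.arange(len(pseudo)), pseudo]
    # Average over the whole unlabeled batch; low-confidence samples contribute zero.
    return float((ce * mask).mean())
```

Averaging over the full batch rather than only the confident samples means the unlabeled term naturally grows as the model becomes more confident during training.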
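With the development of blockchain technology, cryptocurrencies built on it are becoming increasingly popular. This has given rise to a huge cryptocurrency transaction network that has attracted wide attention. Link prediction, which learns the structure of a network, helps reveal the network's mechanisms and has therefore been widely studied on cryptocurrency networks as well. However, past research has ignored the dynamics of cryptocurrency transaction networks. We use a graph regularization method to link past transaction records with future transactions. Based on this, we propose a single latent factor-dependent, non-negative, multiplicative and graph-regularized update (SLF-NMGRU) algorithm, and further propose a graph-regularized non-negative latent factor analysis (GRNLFA) model. Finally, experiments on a real cryptocurrency transaction network show that the proposed method improves both accuracy and computational efficiency.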
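To make "graph-regularized non-negative latent factor analysis" concrete, here is a sketch of the standard multiplicative update for graph-regularized non-negative matrix factorization, restricted to observed entries. It is a swapped-in textbook technique, not the paper's SLF-NMGRU rule: `X` is a non-negative transaction matrix, `M` marks observed entries, `W` is a hypothetical node-similarity graph (for example built from past transactions), and the objective is `||M*(X - U V^T)||^2 + lam * tr(U^T L U)` with `L = D - W`.

```python
import numpy as np

def graph_regularized_nmf(X, M, W, rank=4, lam=0.1, iters=200, seed=0, eps=1e-12):
    """Multiplicative updates for masked NMF with a graph penalty on row factors U."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.random((n, rank))
    V = rng.random((m, rank))
    D = np.diag(W.sum(axis=1))  # degree matrix of the similarity graph
    MX = M * X                  # only observed entries contribute
    for _ in range(iters):
        R = M * (U @ V.T)
        # Non-negative numerators and denominators keep U and V non-negative.
        U *= (MX @ V + lam * W @ U) / (R @ V + lam * D @ U + eps)
        R = M * (U @ V.T)
        V *= (MX.T @ U) / (R.T @ U + eps)
    return U, V

def objective(X, M, W, U, V, lam):
    L = np.diag(W.sum(axis=1)) - W  # graph Laplacian
    fit = np.sum((M * (X - U @ V.T)) ** 2)
    return fit + lam * np.trace(U.T @ L @ U)
```

The graph term pulls the latent factors of linked nodes toward each other, which is how past structure can inform predictions of future links; missing entries of `U @ V.T` then serve as link scores.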
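Weakly supervised object localization is a challenging task that aims to localize objects using coarse annotations such as image categories. Existing deep network approaches are mainly based on class activation maps, which focus on highlighting discriminative local regions while ignoring the full object. In addition, emerging transformer-based techniques constantly place heavy emphasis on the background, which impedes their ability to identify complete objects. To address these issues, we present a re-attention mechanism termed the token refinement transformer (TRT) that captures object-level semantics to guide localization. Specifically, TRT introduces a novel module named the token priority scoring module (TPSM) to suppress the effect of background noise while focusing on the target object. Then, we incorporate the class activation map as a semantically aware input to restrict the attention map to the target object. Extensive experiments on two benchmarks demonstrate the superiority of our method over existing methods that use image category annotations. Source code is available at https://github.com/su-hui-zz/reattentiontransformer.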
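The class activation map (CAM) mentioned above is a weighted sum of feature channels using the classifier weights of one class; thresholding it yields a box. The sketch below shows that generic CAM pipeline plus one simple way to restrict an attention map with a CAM (elementwise product and renormalization). It is not TRT or TPSM; the function names and the 0.5 threshold are assumptions.

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """features: (K, H, W) feature maps; fc_weights: (C, K) classifier weights."""
    cam = np.tensordot(fc_weights[class_idx], features, axes=1)  # (H, W)
    cam = np.maximum(cam, 0.0)  # keep positive evidence only
    if cam.max() > 0:
        cam = cam / cam.max()   # scale to [0, 1]
    return cam

def cam_to_bbox(cam, threshold=0.5):
    """Tight box (x_min, y_min, x_max, y_max) around pixels above the threshold."""
    ys, xs = np.nonzero(cam >= threshold)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

def restrict_attention(attn_map, cam):
    """Down-weight attention outside the CAM's target region, then renormalize."""
    refined = attn_map * cam
    total = refined.sum()
    return refined / total if total > 0 else refined
```

Multiplying by the CAM zeroes attention on regions the classifier considers background, which addresses the transformer's tendency to spread attention there.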
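The Internet of Things (IoT) is on the verge of a major paradigm shift. In the IoT system of the future, IoFT, the cloud will be replaced by the crowd: model training is brought to the edge, allowing IoT devices to collaboratively extract knowledge and build smart analytics and models while keeping their personal data stored locally. This paradigm shift is driven by the tremendous increase in the computational power of IoT devices and by recent advances in decentralized, privacy-preserving model training, known as federated learning (FL). This article presents a vision for IoFT and a systematic overview of current efforts toward realizing it. Specifically, we first introduce the defining characteristics of IoFT and discuss FL approaches, opportunities, and challenges for decentralized inference along three dimensions: (i) a global model that maximizes utility across all IoT devices, (ii) a personalized model that borrows strength across all devices while each device retains its own model, and (iii) a meta-learning model that quickly adapts to new devices or learning tasks. We conclude by describing, through the lens of domain experts, the vision and challenges of IoFT in reshaping different industries, including manufacturing, transportation, energy, healthcare, quality and reliability, business, and computing.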
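The "global model" dimension is commonly realized with federated averaging (FedAvg): each device trains locally on its own data and the server averages the returned parameters, weighted by local sample counts, so raw data never leaves the device. The sketch below applies FedAvg to a toy linear least-squares model; it illustrates the general technique and is not code from the article. The learning rate, epoch, and round counts are arbitrary assumptions.

```python
import numpy as np

def local_gd(w, X, y, lr=0.1, epochs=5):
    """Full-batch gradient descent on one client's mean squared error."""
    for _ in range(epochs):
        grad = 2.0 / len(y) * X.T @ (X @ w - y)
        w = w - lr * grad
    return w

def fedavg(clients, dim, rounds=50, lr=0.1, epochs=5):
    """clients: list of (X, y) pairs that stay local; only parameters are shared."""
    w_global = np.zeros(dim)
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    weights = sizes / sizes.sum()  # weight each client by its sample count
    for _ in range(rounds):
        local_models = [local_gd(w_global.copy(), X, y, lr, epochs) for X, y in clients]
        w_global = sum(p * w for p, w in zip(weights, local_models))
    return w_global
```

In this noise-free toy setting every client shares the same optimum, so the averaged model converges to it; with heterogeneous devices the single global model compromises, which motivates the personalized and meta-learning dimensions.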
Modern machine learning suffers from catastrophic forgetting when learning new classes incrementally: performance degrades dramatically because data from the old classes is missing. Incremental learning methods have been proposed to retain the knowledge acquired from the old classes by using knowledge distillation and keeping a few exemplars from the old classes. However, these methods struggle to scale up to a large number of classes. We believe this is due to the combination of two factors: (a) the data imbalance between the old and new classes, and (b) the increasing number of visually similar classes. Distinguishing between an increasing number of visually similar classes is particularly challenging when the training data is unbalanced. We propose a simple and effective method to address this data imbalance issue. We found that the last fully connected layer has a strong bias towards the new classes, and this bias can be corrected by a linear model. With two bias parameters, our method performs remarkably well on two large datasets, ImageNet (1000 classes) and MS-Celeb-1M (10000 classes), outperforming the state-of-the-art algorithms by 11.1% and 13.2%, respectively.
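A linear correction with two parameters can be sketched as follows, assuming (our reading, not stated in the abstract) that old-class logits pass through unchanged while new-class logits are mapped to `alpha * o + beta`, with `alpha` and `beta` fitted by gradient descent on cross-entropy over a small balanced validation set. Function names, learning rate, and step count are illustrative.

```python
import numpy as np

def bias_correct(logits, new_mask, alpha, beta):
    """Apply alpha * o + beta to new-class logits only; old-class logits are kept."""
    out = logits.astype(float).copy()
    out[:, new_mask] = alpha * out[:, new_mask] + beta
    return out

def softmax_xent(logits, labels):
    """Mean cross-entropy and the softmax probabilities."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_p[np.arange(len(labels)), labels].mean(), np.exp(log_p)

def fit_bias(logits, labels, new_mask, lr=0.05, steps=500):
    """Gradient descent on (alpha, beta) with the network's logits held fixed."""
    alpha, beta = 1.0, 0.0
    onehot = np.eye(logits.shape[1])[labels]
    for _ in range(steps):
        q = bias_correct(logits, new_mask, alpha, beta)
        _, p = softmax_xent(q, labels)
        g = (p - onehot) / len(labels)  # dLoss/dq for mean cross-entropy
        alpha -= lr * np.sum(g[:, new_mask] * logits[:, new_mask])
        beta -= lr * np.sum(g[:, new_mask])
    return alpha, beta
```

Only two scalars are learned, so a handful of balanced validation samples suffices and the correction cannot overfit the way retraining the full classifier could.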
Few-Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes given only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. With these steps, performance on the novel classes improves significantly over our strong baseline. Additionally, our framework can be easily extended to incremental FSIS with minor modifications. When benchmarking on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots; e.g., we improve nAP by +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few-Shot Object Detection. Code and models will be made available.
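A mask-guided class center is typically computed by masked average pooling: support features are averaged only over the pixels inside the support mask, and the resulting vector re-weights query channels. The sketch below shows that generic idea in NumPy; the sigmoid channel gate is our assumption for illustration and is not RefT's dynamic weighting module.

```python
import numpy as np

def masked_class_center(support_feat, support_mask, eps=1e-6):
    """support_feat: (C, H, W); support_mask: (H, W) binary. Returns a (C,) center."""
    m = support_mask.astype(float)
    # Average each channel over foreground pixels only.
    return (support_feat * m).sum(axis=(1, 2)) / (m.sum() + eps)

def reweight_query(query_feat, center):
    """Hypothetical channel gating: scale query channels by sigmoid(center)."""
    gate = 1.0 / (1.0 + np.exp(-center))
    return query_feat * gate[:, None, None]
```

Using the mask keeps background pixels in the support image from diluting the class center, which is why mask guidance gives a more appropriate center than global pooling.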
Benefiting from its ability to exploit intrinsic supervision information, contrastive learning has recently achieved promising performance in deep graph clustering. However, we observe that two drawbacks of the positive and negative sample construction mechanisms prevent existing algorithms from improving further. 1) The quality of positive samples depends heavily on carefully designed data augmentations, and inappropriate augmentations easily lead to semantic drift and indiscriminative positive samples. 2) The constructed negative samples are unreliable because they ignore important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) that mines the intrinsic supervision information in high-confidence clustering results. Specifically, instead of conducting complex node or edge perturbation, we construct two views of the graph with specially designed Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct positive samples from the same high-confidence cluster in the two views. Moreover, to construct semantically meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, thus improving the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function that pulls together samples from the same cluster and pushes apart those from other clusters by maximizing the cross-view cosine similarity between positive samples and minimizing it between negative samples. Extensive experimental results on six datasets demonstrate the effectiveness of CCGC compared with existing state-of-the-art algorithms.
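One plausible reading of the objective, sketched in NumPy under stated assumptions: only nodes whose cluster confidence exceeds a hypothetical threshold `tau` are used; positives are the same node's embeddings in the two views; negatives are the cross-view centers of different clusters. The loss adds `1 - cos` over positives to the mean cosine between different-cluster centers. This illustrates the idea and is not CCGC's exact loss.

```python
import numpy as np

def _normalize(x, eps=1e-12):
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def cluster_guided_loss(z1, z2, labels, confidence, tau=0.9):
    """z1, z2: (N, d) embeddings from the two views; labels: cluster ids."""
    keep = confidence >= tau                  # high-confidence nodes only
    a, b = _normalize(z1[keep]), _normalize(z2[keep])
    y = labels[keep]
    pos = np.mean(1.0 - np.sum(a * b, axis=1))  # pull cross-view positives together
    clusters = np.unique(y)
    c1 = _normalize(np.stack([a[y == k].mean(axis=0) for k in clusters]))
    c2 = _normalize(np.stack([b[y == k].mean(axis=0) for k in clusters]))
    sim = c1 @ c2.T                             # cross-view center similarities
    off_diag = ~np.eye(len(clusters), dtype=bool)
    neg = sim[off_diag].mean() if off_diag.any() else 0.0  # push centers apart
    return float(pos + neg)
```

Using cluster centers rather than individual nodes as negatives avoids treating two nodes of the same cluster as a negative pair, which is the unreliability the abstract points out.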
Accurately determining the binding pose of a small-molecule candidate (ligand) in its target protein pocket is important for computer-aided drug discovery. Typical rigid-body docking methods ignore the flexibility of the protein pocket, while more accurate pose generation using molecular dynamics is hindered by slow protein dynamics. We develop a tiered tensor transform (3T) algorithm to rapidly generate diverse protein-ligand complex conformations for both pose and affinity estimation in drug screening. It requires neither machine learning training nor lengthy dynamics computation, while maintaining both coarse-grain-like coordinated protein dynamics and atomistic-level details of the complex pocket. The 3T conformations we generate are closer to experimental co-crystal structures than those generated by docking software, and, more importantly, achieve significantly higher accuracy in active ligand classification than traditional ensemble docking using hundreds of experimental protein conformations. The 3T structure transformation is decoupled from the system physics, making future use in other computational scientific domains possible.
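Closeness to a co-crystal structure is usually quantified by RMSD after optimal superposition; the abstract does not state the metric, so treat this as an assumption. The sketch below implements the standard Kabsch alignment followed by RMSD for two matched `(N, 3)` coordinate arrays. It is a generic evaluation tool, not part of the 3T algorithm.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between matched (N, 3) coordinates after optimal rigid superposition."""
    P = P - P.mean(axis=0)  # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q             # 3x3 covariance matrix
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T      # rotation taking P onto Q
    P_rot = P @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Q) ** 2, axis=1))))
```

A rotated and translated copy of a structure scores zero, so any remaining RMSD reflects genuine conformational difference rather than placement.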